- Subject: v08i076: Soundex spelling-checker, Part01/02
- Newsgroups: mod.sources
- Approved: mirror!rs
-
- Submitted by: Barry Brachman <brachman@cs.ubc.cdn>
- Mod.sources: Volume 8, Issue 76
- Archive-name: sp/Part01
-
-
- Sp is a soundex-based dictionary search program that uses the
- dbm routines. Included are Makefiles, man pages, and an Mlisp interface.
-
- Barry Brachman
- Dept. of Computer Science
- Univ. of British Columbia
- Vancouver, B.C. V6T 1W5
-
- .. {ihnp4!alberta, uw-beaver}!ubc-vision!ubc-cs!brachman
- brachman@cs.ubc.cdn
- brachman%ubc.csnet@csnet-relay.arpa
- brachman@ubc.csnet
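
For readers new to soundex, the coding step that sp relies on can be sketched as below. This is only an illustration modelled on the letter table and loop in calcsoundex.c from this archive; the function name soundex_sketch and its interface are assumptions made for the example and do not appear in the posting. As in calcsoundex.c, letters with code 0 (the vowels plus h, w and y) reset the previous code, so a repeated code separated by one of them is counted twice.

```c
#include <ctype.h>

/* Digit assigned to each letter a..z; 0 means the letter is skipped. */
static const char code_map[26] = {
    /* a  b  c  d  e  f  g  h  i  j  k  l  m  n  o  p */
       0, 1, 2, 3, 0, 1, 2, 0, 0, 2, 2, 4, 5, 5, 0, 1,
    /* q  r  s  t  u  v  w  x  y  z */
       2, 6, 2, 3, 0, 1, 0, 2, 0, 2
};

/*
 * Hypothetical helper: write the code for word (first letter in lower
 * case followed by three digits) into out, which must hold 5 bytes.
 * Returns 0 on success, -1 if word does not start with a letter.
 */
int soundex_sketch(const char *word, char out[5])
{
    int i, n = 0, prev, c;

    if (!isalpha((unsigned char) word[0]))
        return -1;
    out[0] = (char) tolower((unsigned char) word[0]);
    prev = code_map[out[0] - 'a'];

    for (i = 1; word[i] != '\0' && n < 3; i++) {
        if (!isalpha((unsigned char) word[i]))
            continue;               /* non-letters do not affect the code */
        c = code_map[tolower((unsigned char) word[i]) - 'a'];
        if (c != 0 && c != prev)
            out[++n] = (char) ('0' + c);
        prev = c;                   /* a 0 here lets the same digit repeat */
    }
    while (n < 3)                   /* pad short codes with zeros */
        out[++n] = '0';
    out[4] = '\0';
    return 0;
}
```

With this sketch, "Robert" and "Rupert" both come out as r163, which is why a misspelling that sounds roughly right can still lead to the intended word.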
-
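
The README in the archive notes that the dbm key is kept small and is built from a vector of unsigned chars. Judging from the masks used in misc.c, the key appears to hold 14 bits of soundex value and 10 bits of counter in three bytes. The sketch below restates that layout with hex masks. The names keybyte, pack_key, key_soundex and key_count are assumptions made for this example; the real key_t and KEYSIZE are defined in sp.h, which is not part of this half of the posting.

```c
typedef unsigned char keybyte;  /* assumed stand-in for key_t in sp.h */

/* Pack a 14-bit soundex value and a 10-bit counter into 3 bytes. */
void pack_key(keybyte key[3], unsigned soundex, unsigned count)
{
    key[0] = (keybyte) (soundex & 0xFF);            /* soundex bits 0-7  */
    key[1] = (keybyte) (((soundex >> 8) & 0x3F)     /* soundex bits 8-13 */
                        | ((count & 0x03) << 6));   /* count bits 0-1    */
    key[2] = (keybyte) ((count >> 2) & 0xFF);       /* count bits 2-9    */
}

/* Recover the soundex value from a packed key. */
unsigned key_soundex(const keybyte key[3])
{
    return key[0] | ((unsigned) (key[1] & 0x3F) << 8);
}

/* Recover the counter from a packed key. */
unsigned key_count(const keybyte key[3])
{
    return (unsigned) (key[1] >> 6) | ((unsigned) key[2] << 2);
}
```

Working on individual bytes in this way avoids any dependence on the size or byte order of shorts and ints, which is the reason the README gives for the design.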
- #! /bin/sh
- # This is a shell archive. Remove anything before this line,
- # then unpack it by saving it in a file and typing "sh file".
- # If all goes well, you will see the message "End of archive 1 (of 2)."
- # Contents: README Makefile Makefile.newdbm calcsoundex.c dbm.bug
- # dbm.diffs dbmstuff.c misc.c mksp.c sp.1 MANIFEST
- PATH=/bin:/usr/bin:/usr/ucb; export PATH
- echo shar: extracting "'README'" '(5017 characters)'
- if test -f 'README' ; then
- echo shar: will not over-write existing file "'README'"
- else
- sed 's/^X//' >README <<'@//E*O*F README//'
- X
- XHere is a pair of programs that might be of some use to those who have
- Xtrouble with spelling.
- X
- XThe first program, sp, accepts your tentative or approximate
- Xspelling of a word as input and produces a list of words.
- XIf the correct spelling of the word appears in one of the dictionaries used,
- Xit is likely that it appears in the output list.
- XNote that this is different from the UNIX 'spell' command that
- Xtells you which words in a document do not appear in the dictionary.
- X
- XThe second program, mksp, lets you maintain your own dictionary of troublesome
- Xwords.
- X
- X=====
- XTo run sp you'll need:
- X - the Unix dbm routines, old or new (4.3BSD)
- X
- XNot required, but very useful:
- X - the source to the old dbm routines if you don't have the new ones
- X or your dbm routines don't have dbmclose() (check your man page for
- X dbm(3X) to see if you've got dbmclose())
- X - /usr/dict/words plus any other large list of words you might have
- X
- X=====
- XI apologize for the complexity of the following guide. It is due to the
- Xpossibility of 4 different dbm configurations: 4.3 style dbm, Sun style dbm
- Xwith the dbmclose() routine, "old" (4.2BSD/V7) dbm with source, and "old" dbm
- Xwithout source.
- X
- X1. The program assumes that a char is 8 bits and an int is at least 16 bits.
- X I've avoided using shorts.
- X
- X2. Note the following if you are using the old dbm routines that *don't* have
- X dbmclose():
- X The "old" dbm routines that don't have dbmclose() don't work properly if you
- X do more than one dbminit(). If you have source code, you can apply the
- X diffs so that multiple dbminit() calls will work, allowing
- X multiple dictionaries to be used by sp, although you can still only access
- X one dbm at a time. If you do not have source then you can still use
- X sp/mksp except you must change MAXDICT (in sp.h) to 1 and edit
- X Makefile.newdbm as indicated there. You will only be able to use one
- X dictionary. I'm including a bug report that came off the net for the old
- X dbm routines. This bug patch has been included in dbm.diffs but is
- X surrounded by #ifdef BUGFIX.
- X
- X If you're applying the patches to the old dbm code, make a copy of dbm.c
- X and dbm.h. Apply the patches by:
- X patch < dbm.diffs
- X or by hand (Larry Wall's patch program is in the mod.sources archive).
- X
- X3. Note the following if you are using the old dbm routines that *do* have
- X dbmclose() (e.g., Sun 2 and Sun 3):
- X Edit Makefile.newdbm and uncomment the two lines indicated. Make using
- X Makefile.newdbm (see below).
- X
- X4. Check sp.h and adjust for local conditions. You might also edit sp.1
- X to reflect your local configuration.
- X
- X5. I've tried to make it easy to change the key used for retrieving from
- X the dbm. The routines to make and disassemble a key are in misc.c.
- X I want to keep the key as small as possible since dictionaries tend to
- X be rather large. I've used a vector of unsigned chars for the key because
- X I didn't want to have to deal with various lengths of shorts and ints on
- X different hardware.
- X
- X6. If you are using the "new" dbm routines (e.g., those in 4.3BSD that allow
- X multiple simultaneously open dbm's), if you have dbmclose(), or if you have
- X the old dbm routines without the dbm source then do:
- X make -f Makefile.newdbm
- X otherwise do:
- X make
- X
- X Then move sp, mksp, and calcsoundex to a public directory. Copy sp.1 to
- X where you keep man pages for such programs (you might also link mksp.1 and
- X calcsoundex.1 to sp.1).
- X
- X7. If you are using Gosling EMACS, copy sp.ml (the MLISP interface to sp) into
- X a public EMACS library. I haven't tried to convert sp.ml to work with
- X gnuemacs. Put the documentation (sp.9) where appropriate on your system
- X (you may need to edit the FILES section).
- X
- X8. You should create a public dictionary from /usr/dict/words, e.g.:
- X mksp -a -v /usr/public/lib/sp.dict < /usr/dict/words
- X The path of this dictionary should appear in DEFAULT_SPPATH (sp.h). Users
- X should be made aware of the public version so they don't make their own copy.
- X
- X9. dbm doesn't seem to work between a Sun and a VAX across NFS. Too bad.
- X (It does work between Suns.)
- X Use rsh with the dictionary list on the command line.
- X
- X10. The programs have been tested on a Sun 3/160 (4.2BSD 3.0) and a VAX 750
- X (4.3BSD), using both the new and old dbm routines.
- X
- X11. I have a dictionary of 35K words (350Kb) that do not appear in
- X /usr/dict/words. The only way I have of circulating it is on a
- X double-sided Atari ST or Mac disk (single-sided if ARC'ed). If you are
- X interested send me a message. Perhaps it could be archived somewhere
- X (any volunteers?).
- X
- X12. Reference: Knuth, D.E. The Art of Computer Programming, Volume 3/Sorting
- X and Searching, 1973, pp.391-392.
- X
- X13. If you find any bugs please notify me rather than posting to the net.
- X
- XEnjoy!
- X
- X-----
- XBarry Brachman
- XDept. of Computer Science
- XUniv. of British Columbia
- XVancouver, B.C. V6T 1W5
- X
- X.. {ihnp4!alberta, uw-beaver}!ubc-vision!ubc-cs!brachman
- Xbrachman@cs.ubc.cdn
- Xbrachman%ubc.csnet@csnet-relay.arpa
- Xbrachman@ubc.csnet
- X
- @//E*O*F README//
- if test 5017 -ne "`wc -c <'README'`"; then
- echo shar: error transmitting "'README'" '(should have been 5017 characters)'
- fi
- fi # end of overwriting check
- echo shar: extracting "'Makefile'" '(482 characters)'
- if test -f 'Makefile' ; then
- echo shar: will not over-write existing file "'Makefile'"
- else
- sed 's/^X//' >Makefile <<'@//E*O*F Makefile//'
- X
- X# Makefile for systems using modified dbm routines
- X
- XCFLAGS=-O
- X
- Xall: sp mksp calcsoundex
- X
- Xsp: sp.o dbmstuff.o dbm.o misc.o
- X cc ${CFLAGS} -o sp sp.o dbmstuff.o misc.o dbm.o
- X
- Xmksp: mksp.o dbmstuff.o dbm.o misc.o
- X cc ${CFLAGS} -o mksp mksp.o dbmstuff.o misc.o dbm.o
- X
- Xcalcsoundex: calcsoundex.c
- X cc ${CFLAGS} -o calcsoundex calcsoundex.c
- X
- X# define BUGFIX if you want the fix included
- Xdbm.o: dbm.h dbm.c
- X cc -c ${CFLAGS} dbm.c
- X
- X.c.o:
- X cc -c ${CFLAGS} $?
- X
- Xclean:
- X rm -f sp.o mksp.o dbmstuff.o misc.o dbm.o
- X
- @//E*O*F Makefile//
- if test 482 -ne "`wc -c <'Makefile'`"; then
- echo shar: error transmitting "'Makefile'" '(should have been 482 characters)'
- fi
- fi # end of overwriting check
- echo shar: extracting "'Makefile.newdbm'" '(850 characters)'
- if test -f 'Makefile.newdbm' ; then
- echo shar: will not over-write existing file "'Makefile.newdbm'"
- else
- sed 's/^X//' >Makefile.newdbm <<'@//E*O*F Makefile.newdbm//'
- X
- X# Makefile for systems with the new dbm routines (e.g., 4.3BSD systems),
- X# those with a dbm library containing dbmclose() (e.g., Sun 2 and Sun 3),
- X# and those with the old dbm without source
- X
- XCFLAGS=-O -DNEWDBM
- XLIB=
- X
- X# If you are using the old dbm routines with dbmclose(), uncomment
- X# the following two lines (otherwise comment them out)
- X# CFLAGS=-O -DHAS_CLOSE
- X# LIB=-ldbm
- X
- X# If you are using the old dbm routines without dbmclose(), uncomment
- X# the following two lines (otherwise comment them out)
- X# CFLAGS=-O
- X# LIB=-ldbm
- X
- Xall: sp mksp calcsoundex
- X
- Xsp: sp.o dbmstuff.o misc.o sp.h
- X cc ${CFLAGS} -o sp sp.o dbmstuff.o misc.o ${LIB}
- X
- Xmksp: mksp.o dbmstuff.o misc.o sp.h
- X cc ${CFLAGS} -o mksp mksp.o dbmstuff.o misc.o ${LIB}
- X
- Xcalcsoundex: calcsoundex.c
- X cc ${CFLAGS} -o calcsoundex calcsoundex.c
- X
- X.c.o:
- X cc -c ${CFLAGS} $?
- X
- Xclean:
- X rm -f sp.o mksp.o dbmstuff.o misc.o
- X
- @//E*O*F Makefile.newdbm//
- if test 850 -ne "`wc -c <'Makefile.newdbm'`"; then
- echo shar: error transmitting "'Makefile.newdbm'" '(should have been 850 characters)'
- fi
- fi # end of overwriting check
- echo shar: extracting "'calcsoundex.c'" '(2455 characters)'
- if test -f 'calcsoundex.c' ; then
- echo shar: will not over-write existing file "'calcsoundex.c'"
- else
- sed 's/^X//' >calcsoundex.c <<'@//E*O*F calcsoundex.c//'
- X/* vi: set tabstop=4 : */
- X
- X/*
- X * calcsoundex - calculate soundex codes
- X *
- X * Permission is given to copy or distribute this program provided you
- X * do not remove this header or make money off of the program.
- X *
- X * Please send comments and suggestions to:
- X * Barry Brachman
- X * Dept. of Computer Science
- X * Univ. of British Columbia
- X * Vancouver, B.C. V6T 1W5
- X *
- X * .. {ihnp4!alberta, uw-beaver}!ubc-vision!ubc-cs!brachman
- X * brachman@cs.ubc.cdn
- X * brachman%ubc.csnet@csnet-relay.arpa
- X * brachman@ubc.csnet
- X */
- X
- X#include <stdio.h>
- X#include <ctype.h>
- X
- X#include "sp.h"
- X
- Xchar word[MAXWORDLEN + 2];
- X
- Xchar soundex_code_map[26] = {
- X/*** A B C D E F G H I J K L M N O P ***/
- X 0, 1, 2, 3, 0, 1, 2, 0, 0, 2, 2, 4, 5, 5, 0, 1,
- X
- X/*** Q R S T U V W X Y Z ***/
- X 2, 6, 2, 3, 0, 1, 0, 2, 0, 2
- X};
- X
- Xmain(argc, argv)
- Xint argc;
- Xchar **argv;
- X{
- X register int c, i, soundex_length, digit_part, previous_code;
- X int ch, len, vflag;
- X short soundex;
- X char *gets();
- X
- X vflag = 0;
- X if (argc > 2 || (argc == 2 && strcmp(argv[1], "-v"))) {
- X fprintf(stderr, "Usage: calcsoundex [-v]\n");
- X exit(1);
- X }
- X if (argc > 1)
- X vflag = 1;
- X
- X while (fgets(word, sizeof(word), stdin) != (char *) NULL) {
- X len = strlen(word);
- X if (word[len - 1] != '\n') {
- X fprintf(stderr, "calcsoundex: Word too long: %s", word);
- X while ((ch = getchar()) != '\n' && ch != EOF) /* flush rest of line */
- X putc(ch, stderr);
- X putc('\n', stderr);
- X continue;
- X }
- X word[--len] = '\0';
- X if (len > MAXWORDLEN) {
- X fprintf(stderr, "calcsoundex: Word too long: %s\n", word);
- X continue;
- X }
- X
- X for (i = 0; word[i] != '\0'; i++) {
- X if (isupper(word[i]))
- X word[i] = tolower(word[i]);
- X }
- X if (!isalpha(word[0]))
- X continue;
- X
- X digit_part = 0;
- X soundex_length = 0;
- X previous_code = soundex_code_map[word[0] - 'a'];
- X for (i = 1; word[i] != '\0' && soundex_length < 3; i++) {
- X if (!isalpha(word[i]))
- X continue;
- X c = soundex_code_map[word[i] - 'a'];
- X if (c == 0 || previous_code == c) {
- X previous_code = c;
- X continue;
- X }
- X digit_part = digit_part * 10 + c;
- X previous_code = c;
- X soundex_length++;
- X }
- X while (soundex_length++ < 3)
- X digit_part *= 10;
- X soundex = (digit_part << 5) + word[0] - 'a';
- X printf("%c", word[0]);
- X if (digit_part < 100)
- X putchar('0');
- X if (digit_part < 10)
- X putchar('0');
- X if (digit_part == 0)
- X putchar('0');
- X else
- X printf("%d", digit_part);
- X if (vflag)
- X printf(" %s", word);
- X putchar('\n');
- X }
- X putchar('\n');
- X exit(0);
- X}
- X
- @//E*O*F calcsoundex.c//
- if test 2455 -ne "`wc -c <'calcsoundex.c'`"; then
- echo shar: error transmitting "'calcsoundex.c'" '(should have been 2455 characters)'
- fi
- fi # end of overwriting check
- echo shar: extracting "'dbm.bug'" '(1839 characters)'
- if test -f 'dbm.bug' ; then
- echo shar: will not over-write existing file "'dbm.bug'"
- else
- sed 's/^X//' >dbm.bug <<'@//E*O*F dbm.bug//'
- XArticle 770 of net.bugs.4bsd:
- XPath: ubc-cs!ubc-ean!alberta!ihnp4!mhuxn!mhuxr!ulysses!allegra!mit-eddie!genrad!panda!talcott!harvard!seismo!elsie!ado
- XFrom: ado@elsie.UUCP (Arthur David Olson)
- XSubject: 4.?bsd dbm's store(k,c) dies if (i=k.dsize+c.dsize)==1018||i==1019--FIX
- XDate: Wed, 10-Apr-85 09:02:36 PST
- XDate-Received: Thu, 11-Apr-85 13:03:49 PST
- XOrganization: NIH-LEC, Bethesda, MD
- X
- XIndex: lib/libdbm/dbm.c Fix
- X
- XDescription:
- X 4.?bsd dbm's store function misbehaves if the sum of the key data
- X size and content data size is either 1018 or 1019.
- X
- XRepeat-By:
- X Compile this program with the "dbm" library:
- X
- X typedef struct {
- X char * dptr;
- X int dsize;
- X } datum;
- X
- X char buf[1024];
- X
- X main(argc, argv)
- X int argc;
- X char * argv[];
- X {
- X int result;
- X datum key;
- X datum content;
- X
- X key.dptr = content.dptr = buf;
- X key.dsize = atoi(argv[1]);
- X content.dsize = 0;
- X creat("fake.dir", 0600);
- X creat("fake.pag", 0600);
- X dbminit("fake");
- X result = store(key, content);
- X printf("%d\n", result);
- X }
- X
- X Then run the program. If you use commands such as
- X a.out 0
- X a.out 1
- X ...
- X a.out 1017
- X things go swimmingly. If you use commands such as
- X a.out 1020
- X a.out 1021
- X ...
- X an error message is (correctly) produced. But if you use either
- X the command
- X a.out 1018
- X or
- X a.out 1019
- X things go wild.
- X
- XFix:
- X As usual, the trade secret status of the code involved precludes a
- X clearer posting. The fix is to change one line in "dbm.c"; it
- X causes an error message to be produced in the 1018/1019 cases:
- X
- X #ifdef OLDVERSION
- X if(key.dsize+dat.dsize+2*sizeof(short) >= PBLKSIZ) {
- X #else
- X if(key.dsize+dat.dsize+3*sizeof(short) >= PBLKSIZ) {
- X #endif
- X--
- XBugs is a Warner Brothers trademark
- X--
- X UUCP: ..decvax!seismo!elsie!ado ARPA: elsie!ado@seismo.ARPA
- X DEC, VAX and Elsie are Digital Equipment and Borden trademarks
- X
- X
- @//E*O*F dbm.bug//
- if test 1839 -ne "`wc -c <'dbm.bug'`"; then
- echo shar: error transmitting "'dbm.bug'" '(should have been 1839 characters)'
- fi
- fi # end of overwriting check
- echo shar: extracting "'dbm.diffs'" '(2748 characters)'
- if test -f 'dbm.diffs' ; then
- echo shar: will not over-write existing file "'dbm.diffs'"
- else
- sed 's/^X//' >dbm.diffs <<'@//E*O*F dbm.diffs//'
- XIndex: dbm.c
- X*** dbm.orig.c Thu Nov 27 17:45:38 1986
- X--- dbm.c Thu Nov 27 18:13:11 1986
- X***************
- X*** 6,16 ****
- X--- 6,21 ----
- X #include <sys/types.h>
- X #include <sys/stat.h>
- X
- X+ static long dbm_access_oldb;
- X+ static getbit_oldb;
- X+
- X dbminit(file)
- X char *file;
- X {
- X struct stat statb;
- X
- X+ dbm_access_oldb = -1;
- X+ getbit_oldb = -1;
- X dbrdonly = 0;
- X strcpy(pagbuf, file);
- X strcat(pagbuf, ".pag");
- X***************
- X*** 27,36 ****
- X dirf = open(pagbuf, 0);
- X dbrdonly = 1;
- X }
- X! if(pagf < 0 || dirf < 0) {
- X! printf("cannot open database %s\n", file);
- X return(-1);
- X- }
- X fstat(dirf, &statb);
- X maxbno = statb.st_size*BYTESIZ-1;
- X return(0);
- X--- 32,39 ----
- X dirf = open(pagbuf, 0);
- X dbrdonly = 1;
- X }
- X! if(pagf < 0 || dirf < 0)
- X return(-1);
- X fstat(dirf, &statb);
- X maxbno = statb.st_size*BYTESIZ-1;
- X return(0);
- X***************
- X*** 130,136 ****
- X--- 133,143 ----
- X return (0);
- X
- X split:
- X+ #ifdef BUGFIX
- X+ if(key.dsize+dat.dsize+3*sizeof(short) >= PBLKSIZ) {
- X+ #else
- X if(key.dsize+dat.dsize+2*sizeof(short) >= PBLKSIZ) {
- X+ #endif
- X printf("entry too big\n");
- X return (-1);
- X }
- X***************
- X*** 226,232 ****
- X dbm_access(hash)
- X long hash;
- X {
- X! static long oldb = -1;
- X
- X for(hmask=0;; hmask=(hmask<<1)+1) {
- X blkno = hash & hmask;
- X--- 233,239 ----
- X dbm_access(hash)
- X long hash;
- X {
- X! /*** static long oldb = -1; ***/
- X
- X for(hmask=0;; hmask=(hmask<<1)+1) {
- X blkno = hash & hmask;
- X***************
- X*** 234,245 ****
- X if(getbit() == 0)
- X break;
- X }
- X! if(blkno != oldb) {
- X clrbuf(pagbuf, PBLKSIZ);
- X lseek(pagf, blkno*PBLKSIZ, 0);
- X read(pagf, pagbuf, PBLKSIZ);
- X chkblk(pagbuf);
- X! oldb = blkno;
- X }
- X }
- X
- X--- 241,252 ----
- X if(getbit() == 0)
- X break;
- X }
- X! if(blkno != dbm_access_oldb) {
- X clrbuf(pagbuf, PBLKSIZ);
- X lseek(pagf, blkno*PBLKSIZ, 0);
- X read(pagf, pagbuf, PBLKSIZ);
- X chkblk(pagbuf);
- X! dbm_access_oldb = blkno;
- X }
- X }
- X
- X***************
- X*** 247,253 ****
- X {
- X long bn;
- X register b, i, n;
- X! static oldb = -1;
- X
- X if(bitno > maxbno)
- X return(0);
- X--- 254,260 ----
- X {
- X long bn;
- X register b, i, n;
- X! /*** static oldb = -1; ***/
- X
- X if(bitno > maxbno)
- X return(0);
- X***************
- X*** 255,265 ****
- X bn = bitno / BYTESIZ;
- X i = bn % DBLKSIZ;
- X b = bn / DBLKSIZ;
- X! if(b != oldb) {
- X clrbuf(dirbuf, DBLKSIZ);
- X lseek(dirf, (long)b*DBLKSIZ, 0);
- X read(dirf, dirbuf, DBLKSIZ);
- X! oldb = b;
- X }
- X if(dirbuf[i] & (1<<n))
- X return(1);
- X--- 262,272 ----
- X bn = bitno / BYTESIZ;
- X i = bn % DBLKSIZ;
- X b = bn / DBLKSIZ;
- X! if(b != getbit_oldb) {
- X clrbuf(dirbuf, DBLKSIZ);
- X lseek(dirf, (long)b*DBLKSIZ, 0);
- X read(dirf, dirbuf, DBLKSIZ);
- X! getbit_oldb = b;
- X }
- X if(dirbuf[i] & (1<<n))
- X return(1);
- @//E*O*F dbm.diffs//
- if test 2748 -ne "`wc -c <'dbm.diffs'`"; then
- echo shar: error transmitting "'dbm.diffs'" '(should have been 2748 characters)'
- fi
- fi # end of overwriting check
- echo shar: extracting "'dbmstuff.c'" '(1595 characters)'
- if test -f 'dbmstuff.c' ; then
- echo shar: will not over-write existing file "'dbmstuff.c'"
- else
- sed 's/^X//' >dbmstuff.c <<'@//E*O*F dbmstuff.c//'
- X/* dbmstuff.c */
- X
- X/*
- X * Interface to old and new dbm routines
- X */
- X
- X#include <stdio.h>
- X
- X#ifndef NEWDBM
- X
- X#include <dbm.h>
- X
- X/*ARGSUSED*/
- XDBMINIT(path, flags)
- Xchar *path;
- Xint flags;
- X{
- X
- X return(dbminit(path));
- X}
- X
- XDBMCLOSE()
- X{
- X
- X#ifdef HAS_CLOSE
- X dbmclose();
- X#else
- X close(3); /* free up the file descriptors */
- X close(4);
- X#endif
- X}
- X
- Xdatum
- XFETCH(key)
- Xdatum key;
- X{
- X datum fetch();
- X
- X return(fetch(key));
- X}
- X
- Xdatum
- XFIRSTKEY()
- X{
- X
- X return(firstkey());
- X}
- X
- Xdatum
- XNEXTKEY(key)
- Xdatum key;
- X{
- X return(nextkey(key));
- X}
- X
- XSTORE(key, content)
- Xdatum key, content;
- X{
- X
- X return(store(key, content));
- X}
- X
- XREPLACE(key, content)
- Xdatum key, content;
- X{
- X
- X if (delete(key) == -1)
- X return(-1);
- X return(store(key, content));
- X}
- X
- XDELETE(key)
- Xdatum key;
- X{
- X
- X return(delete(key));
- X}
- X
- X#endif /* !NEWDBM */
- X
- X#ifdef NEWDBM
- X
- X#include <ndbm.h>
- X
- Xstatic DBM *current_db = (DBM *) NULL;
- X
- XDBMINIT(path, flags)
- Xchar *path;
- Xint flags;
- X{
- X
- X current_db = dbm_open(path, flags, 0);
- X if (current_db == (DBM *) NULL)
- X return(-1);
- X return(0);
- X}
- X
- XDBMCLOSE()
- X{
- X
- X if (current_db != (DBM *) NULL) {
- X dbm_close(current_db);
- X current_db = (DBM *) NULL;
- X }
- X}
- X
- Xdatum
- XFETCH(key)
- Xdatum key;
- X{
- X
- X return(dbm_fetch(current_db, key));
- X}
- X
- Xdatum
- XFIRSTKEY()
- X{
- X
- X return(dbm_firstkey(current_db));
- X}
- X
- X/*ARGSUSED*/
- Xdatum
- XNEXTKEY(key)
- Xdatum key;
- X{
- X
- X return(dbm_nextkey(current_db));
- X}
- X
- XREPLACE(key, content)
- Xdatum key, content;
- X{
- X
- X return(dbm_store(current_db, key, content, DBM_REPLACE));
- X}
- X
- XSTORE(key, content)
- Xdatum key, content;
- X{
- X
- X return(dbm_store(current_db, key, content, DBM_INSERT));
- X}
- X
- XDELETE(key)
- Xdatum key;
- X{
- X
- X return(dbm_delete(current_db, key));
- X}
- X
- X#endif /* NEWDBM */
- @//E*O*F dbmstuff.c//
- if test 1595 -ne "`wc -c <'dbmstuff.c'`"; then
- echo shar: error transmitting "'dbmstuff.c'" '(should have been 1595 characters)'
- fi
- fi # end of overwriting check
- echo shar: extracting "'misc.c'" '(3643 characters)'
- if test -f 'misc.c' ; then
- echo shar: will not over-write existing file "'misc.c'"
- else
- sed 's/^X//' >misc.c <<'@//E*O*F misc.c//'
- X/* misc.c */
- X
- X/* vi: set tabstop=4 : */
- X
- X#include <ctype.h>
- X#include <stdio.h>
- X
- X#include "sp.h"
- X
- X/*
- X * Special character map that determines what the second character of a word
- X * can be; see sp.h
- X * May be expanded to contain up to 12 entries plus the terminating entry
- X * Must end with an entry of two null bytes
- X */
- Xstruct spchar_map spchar_map[] = {
- X '\'', QUOTE_CHAR,
- X '&', AMPER_CHAR,
- X '.', PERIOD_CHAR,
- X ' ', SPACE_CHAR,
- X '\0', '\0'
- X};
- X
- Xmk_key(key, soundex, count)
- Xkey_t *key;
- Xint soundex;
- Xint count;
- X{
- X
- X key[0] = soundex & 0377;
- X key[1] = ((soundex & 037400) >> 8) | ((count & 03) << 6);
- X key[2] = (count & 01774) >> 2;
- X#ifdef DEBUG
- X if (ex_soundex(key) != soundex)
- X fprintf(stderr, "mk_key: soundex failed\n");
- X if (ex_count(key) != count)
- X fprintf(stderr, "mk_key: count failed\n");
- X#endif /* DEBUG */
- X}
- X
- Xex_soundex(key)
- Xkey_t *key;
- X{
- X register int soundex;
- X
- X soundex = key[0] & 0377;
- X soundex |= (key[1] & 077) << 8;
- X return(soundex);
- X}
- X
- Xex_count(key)
- Xkey_t *key;
- X{
- X register int count;
- X
- X count = (key[1] & 0300) >> 6;
- X count |= ((key[2] & 0377) << 2);
- X return(count);
- X}
- X
- X/*
- Xex_char(key)
- Xkey_t *key;
- X{
- X int ch;
- X
- X ch = (key[1] & 076) >> 1;
- X return(ch + 'a');
- X}
- X*/
- X
- X/*
- X * Unpack a word given the retrieved word of length len and its soundex
- X * Extract the first letter from the soundex code
- X * If the length is 1 and if it is marked as a single character word
- X * then the marked character will be overlaid with a null
- X * otherwise a null will be appended to the string
- X * Adjust for upper case leading character if necessary
- X * Return address of the copy
- X */
- Xchar *
- Xmk_word(p, len, s)
- Xchar *p;
- Xint len, s;
- X{
- X register char *q, ch;
- X static char word[MAXWORDLEN + 2];
- X
- X q = word;
- X if (len == 1 && (*p & SINGLE_CHAR)) {
- X *(q + 1) = '\0';
- X len = 0;
- X }
- X else
- X *(q + len + 1) = '\0';
- X
- X /*
- X * Extract the first character from the soundex and
- X * adjust case
- X */
- X if (*p & UPPER_CHAR)
- X ch = (s & 037) + 'A';
- X else
- X ch = (s & 037) + 'a';
- X *q++ = ch;
- X
- X if (len != 0) { /* if more than one char adjust second char */
- X ch = *p & MASK_CHAR;
- X if (ch < 26)
- X ch += 'a';
- X else if (ch < 52)
- X ch = ch - 26 + 'A';
- X else if ((ch = fromspchar(ch)) == '\0') {
- X fprintf(stderr, "Bogus second char in mk_word\n");
- X exit(1);
- X }
- X *q++ = ch;
- X p++;
- X len--;
- X }
- X
- X while (len-- > 0)
- X *q++ = *p++;
- X return(word);
- X}
- X
- X/*
- X * Convert the second character of a word to a special character code
- X * Return null if there is no mapping
- X */
- Xtospchar(ch)
- Xchar ch;
- X{
- X register struct spchar_map *m;
- X
- X for (m = spchar_map; m->spchar != '\0'; m++)
- X if (ch == m->spchar)
- X break;
- X return(m->code);
- X}
- X
- X/*
- X * Convert from the special character code to the ASCII code
- X * Return null if there is no mapping
- X */
- Xfromspchar(ch)
- Xchar ch;
- X{
- X register struct spchar_map *m;
- X
- X for (m = spchar_map; m->spchar != '\0'; m++)
- X if (ch == m->code)
- X break;
- X return(m->spchar);
- X}
- X
- X/*
- X * Compare two strings, independent of case, given their lengths
- X */
- X/*
- Xstrnmatch(str1, len1, str2, len2)
- Xchar *str1, *str2;
- Xint len1, len2;
- X{
- X register char ch1, ch2;
- X
- X if (len1 != len2)
- X return(0);
- X while (len1-- > 0) {
- X ch1 = *str1++;
- X ch2 = *str2++;
- X if (ch1 != ch2) {
- X if (isupper(ch1))
- X ch1 = tolower(ch1);
- X if (isupper(ch2))
- X ch2 = tolower(ch2);
- X if (ch1 != ch2)
- X return(0);
- X }
- X }
- X return(1);
- X}
- X*/
- X
- X/*
- X * Compare two strings, independent of case
- X */
- Xstrmatch(p, q)
- Xchar *p, *q;
- X{
- X register char ch1, ch2;
- X
- X while (1) {
- X ch1 = *p++;
- X ch2 = *q++;
- X if (ch1 == '\0' || ch2 == '\0')
- X break;
- X if (ch1 != ch2) {
- X if (isupper(ch1))
- X ch1 = tolower(ch1);
- X if (isupper(ch2))
- X ch2 = tolower(ch2);
- X if (ch1 != ch2)
- X break;
- X }
- X }
- X return(ch1 - ch2);
- X}
- X
- @//E*O*F misc.c//
- if test 3643 -ne "`wc -c <'misc.c'`"; then
- echo shar: error transmitting "'misc.c'" '(should have been 3643 characters)'
- fi
- fi # end of overwriting check
- echo shar: extracting "'mksp.c'" '(12797 characters)'
- if test -f 'mksp.c' ; then
- echo shar: will not over-write existing file "'mksp.c'"
- else
- sed 's/^X//' >mksp.c <<'@//E*O*F mksp.c//'
- X/* vi: set tabstop=4 : */
- X
- X/*
- X * mksp - make soundex dictionary
- X * Version 1.3, December 1986
- X *
- X * If <soundexfile.{dir, pag}> do not exist, try to create them
- X * If they do exist and are not empty, then words will be added
- X * from the standard input
- X * Only when words are being added to an existing database are duplicate words
- X * ignored
- X * Valid words (words beginning with an alphabetic) are stored as given but
- X * comparisons for duplicates are case independent.
- X * Non-alphabetic characters are ignored in computing the soundex
- X *
- X * Permission is given to copy or distribute this program provided you
- X * do not remove this header or make money off of the program.
- X *
- X * Please send comments and suggestions to:
- X * Barry Brachman
- X * Dept. of Computer Science
- X * Univ. of British Columbia
- X * Vancouver, B.C. V6T 1W5
- X *
- X * .. {ihnp4!alberta, uw-beaver}!ubc-vision!ubc-cs!brachman
- X * brachman@cs.ubc.cdn
- X * brachman%ubc.csnet@csnet-relay.arpa
- X * brachman@ubc.csnet
- X */
- X
- X#include <ctype.h>
- X#include <errno.h>
- X#include <sys/types.h>
- X#include <sys/file.h>
- X#include <sys/stat.h>
- X#include <stdio.h>
- X
- X#ifdef NEWDBM
- X#include <ndbm.h>
- X#include <sys/file.h>
- X#else /* !NEWDBM */
- X#include <dbm.h>
- X#endif /* NEWDBM */
- X
- X#include "sp.h"
- X
- X#define NEW_DICT 0
- X#define OLD_DICT 1
- X
- X#define VFREQ 1000 /* frequency for verbose messages */
- X
- X#define streq(X, Y) (!strcmp(X, Y))
- X#define strneq(X, Y, N) (!strncmp(X, Y, N))
- X
- X#define USAGE "Usage: mksp -tad [-v[#]] <soundexfile>"
- X
- Xint map[26][667];
- X
- Xdatum FETCH(), FIRSTKEY(), NEXTKEY();
- X
- Xkey_t keyvec[KEYSIZE];
- Xkey_t *key = keyvec;
- X
- X/*
- X * Soundex codes
- X * The program depends upon the numbers zero through six being used
- X * but this can easily be changed
- X */
- Xchar soundex_code_map[26] = {
- X/*** A B C D E F G H I J K L M N O P ***/
- X 0, 1, 2, 3, 0, 1, 2, 0, 0, 2, 2, 4, 5, 5, 0, 1,
- X
- X/*** Q R S T U V W X Y Z ***/
- X 2, 6, 2, 3, 0, 1, 0, 2, 0, 2
- X};
- X
- Xint digit_part;
- X
- Xint aflag, dflag, tflag, vflag;
- X
- Xchar *mk_word();
- X
- Xmain(argc, argv)
- Xint argc;
- Xchar **argv;
- X{
- X register int i;
- X char *file;
- X
- X if (argc != 3 && argc != 4) {
- X fprintf(stderr, "%s\n", USAGE);
- X exit(1);
- X }
- X aflag = dflag = tflag = vflag = 0;
- X file = (char *) NULL;
- X
- X for (i = 1; i < argc; i++) {
- X if (streq(argv[i], "-a"))
- X aflag = 1;
- X else if (streq(argv[i], "-d"))
- X dflag = 1;
- X else if (streq(argv[i], "-t"))
- X tflag = 1;
- X else if (strneq(argv[i], "-v", 2)) {
- X if (isdigit(argv[i][2]))
- X vflag = atoi(&argv[i][2]);
- X else
- X vflag = 1;
- X }
- X else if (file == (char *) NULL)
- X file = argv[i];
- X else {
- X fprintf(stderr, "%s\n", USAGE);
- X exit(1);
- X }
- X }
- X
- X if (file == (char *) NULL || (tflag + aflag + dflag) != 1) {
- X fprintf(stderr, "%s\n", USAGE);
- X exit(1);
- X }
- X
- X if (aflag) {
- X addwords(file);
- X if (vflag > 1) {
- X register int j, total;
- X int m, max;
- X
- X fprintf(stderr, "Counters:\n");
- X for (i = 0; i < 26; i++) {
- X total = max = map[i][0];
- X for (j = 1; j < 667; j++) {
- X total += (m = map[i][j]);
- X if (m > max)
- X max = m;
- X }
- X if (max > 0)
- X fprintf(stderr, "%c: max %d total %d\n", 'a'+i, max, total);
- X }
- X }
- X }
- X else if (dflag)
- X deletewords(file);
- X else if (tflag)
- X prcontents(file);
- X exit(0);
- X}
- X
- X/*
- X * Add words read from stdin to the database
- X * The key is the 3 digit soundex code for the word plus a disambiguating
- X * counter. Different counter values are used for words with the same soundex
- X * code. The maximum counter value is MAXCOUNT. If the counter overflows then
- X * we lose, but given at least 8 bits this seems unlikely.
- X */
- Xaddwords(name)
- Xchar *name;
- X{
- X register int c, count, delete, duplicate, i, len;
- X register int *p;
- X int ch, s, status;
- X char wcopy[MAXWORDLEN + 2], word[MAXWORDLEN + 2];
- X datum dbm_key, dbm_content;
- X
- X status = setup(name);
- X if (DBMINIT(name, O_RDWR) == -1) {
- X fprintf(stderr, "mksp: Can't initialize\n");
- X exit(1);
- X }
- X
- X for (i = 0; i < 26; i++)
- X for (c = 0; c < 667; c++)
- X map[i][c] = 0;
- X
- X dbm_key.dptr = (char *) key;
- X dbm_key.dsize = KEYSIZE;
- X
- X count = 0;
- X
- X while (fgets(word, sizeof(word), stdin) != (char *) NULL) {
- X len = strlen(word);
- X if (word[len - 1] != '\n') {
- X fprintf(stderr, "mksp: Word too long: %s", word);
- X while ((ch = getchar()) != '\n' && ch != EOF) /* flush rest of line */
- X putc(ch, stderr);
- X putc('\n', stderr);
- X continue;
- X }
- X word[--len] = '\0';
- X if (len > MAXWORDLEN) {
- X fprintf(stderr, "mksp: Word too long: %s\n", word);
- X continue;
- X }
- X
- X if ((s = soundex(word, 3)) == BAD_WORD) {
- X if (vflag)
- X fprintf(stderr, "Ignoring bad word: %s\n", word);
- X continue;
- X }
- X ch = (isupper(word[0]) ? tolower(word[0]) : word[0]) - 'a';
- X p = &(map[ch][digit_part]);
- X
- X /*
- X * If words are being added to an old dictionary,
- X * check for duplication and watch for a deleted entry
- X * The reason for only checking for duplicates in old dictionaries is
- X * that usually when you're creating a new dictionary the words are
- X * already sorted and unique and the creation of a large dictionary is
- X * slow enough already.
- X */
- X duplicate = 0;
- X delete = -1; /* an 'impossible' counter */
- X if (status == OLD_DICT) {
- X c = 0;
- X while (1) {
- X mk_key(key, s, c);
- X dbm_content = FETCH(dbm_key);
- X if (dbm_content.dptr == 0)
- X break;
- X
- X if (!IS_DELETED(dbm_content)) {
- X char *str;
- X
- X str = mk_word(dbm_content.dptr, dbm_content.dsize, s);
- X if (strmatch(word, str) == 0) {
- X duplicate = 1;
- X if (vflag)
- X fprintf(stderr, "duplicate: %s\n", word);
- X break;
- X }
- X }
- X else if (delete < 0) /* choose delete nearest front */
- X delete = c;
- X
- X if (++c > MAXCOUNT) {
- X fprintf(stderr, "mksp: Counter overflow\n");
- X fprintf(stderr, "soundex: %c%d\n", ch+'a', b10(digit_part));
- X exit(1);
- X }
- X }
- X if (duplicate)
- X continue;
- X *p = c;
- X }
- X if (*p > MAXCOUNT) {
- X fprintf(stderr, "mksp: Counter overflow\n");
- X fprintf(stderr, "soundex: %c%d\n", ch+'a', b10(digit_part));
- X exit(1);
- X }
- X mk_key(key, s, *p);
- X *p = *p + 1;
- X strcpy(wcopy, word);
- X if (len == 1) {
- X if (isupper(wcopy[0]))
- X wcopy[0] |= UPPER_CHAR;
- X wcopy[0] |= SINGLE_CHAR;
- X dbm_content.dptr = wcopy;
- X }
- X else {
- X if (isupper(wcopy[1]))
- X wcopy[1] = wcopy[1] - 'A' + 26;
- X else if (islower(wcopy[1]))
- X wcopy[1] = wcopy[1] - 'a';
- X else if ((wcopy[1] = tospchar(wcopy[1])) == '\0') {
- X fprintf(stderr, "Bogus second char: can't happen!\n");
- X exit(1);
- X }
- X if (isupper(wcopy[0]))
- X wcopy[1] = wcopy[1] | UPPER_CHAR;
- X dbm_content.dptr = wcopy + 1;
- X len--;
- X }
- X dbm_content.dsize = len; /* null not stored */
- X if (delete < 0) {
- X if (STORE(dbm_key, dbm_content) == -1) {
- X fprintf(stderr, "mksp: Can't store\n");
- X exit(1);
- X }
- X }
- X else {
- X if (vflag)
- X fprintf(stderr, "reusing: %s\n", word);
- X mk_key(key, s, delete);
- X if (REPLACE(dbm_key, dbm_content) == -1) {
- X fprintf(stderr, "mksp: Can't replace\n");
- X exit(1);
- X }
- X }
- X count++;
- X if (vflag > 1)
- X fprintf(stderr, "%5d: %s(%d)\n", count, word, ex_count(key));
- X if (vflag && (count % VFREQ) == 0)
- X fprintf(stderr, "%5d: %s\n", count, word);
- X }
- X if (vflag)
- X fprintf(stderr, "%d words\n", count);
- X DBMCLOSE();
- X}
- X
- X/*
- X * Print out everything
- X */
- Xprcontents(name)
- Xchar *name;
- X{
- X register int s;
- X datum dbm_key, dbm_content;
- X
- X if (DBMINIT(name, O_RDONLY) == -1)
- X exit(1);
- X
- X dbm_key = FIRSTKEY();
- X while (dbm_key.dptr != NULL) {
- X dbm_content = FETCH(dbm_key);
- X if (dbm_content.dptr == 0)
- X break; /* ??? */
- X
- X if (vflag)
- X printf("%3d. ", ex_count((key_t *) dbm_key.dptr));
- X if (IS_DELETED(dbm_content)) {
- X if (vflag)
- X printf("(deleted)\n");
- X }
- X else {
- X s = ex_soundex((key_t *) dbm_key.dptr);
- X printf("%s\n", mk_word(dbm_content.dptr, dbm_content.dsize, s));
- X }
- X dbm_key = NEXTKEY(dbm_key);
- X }
- X DBMCLOSE();
- X}
- X
- X/*
- X * When words are deleted they must be marked as such rather than deleted
- X * using DELETE(). This is because the sequence of counters must remain
- X * continuous. If DELETE() is used then any entries with the same soundex
- X * but with a larger counter value would not be accessible. This approach
- X * does cost some extra space but if an addition is made to the chain then
- X * a deleted counter slot will be reused. Also, the storage used by the word
- X * should be made available to dbm. This could be improved somewhat
- X * by actually using DELETE() on the last entry of the chain.
- X */
- Xdeletewords(name)
- Xchar *name;
- X{
- X register int c, ch, len, s;
- X register char *p;
- X char word[MAXWORDLEN + 2];
- X datum dbm_key, dbm_content;
- X
- X if (DBMINIT(name, O_RDWR) == -1)
- X exit(1);
- X
- X while (fgets(word, sizeof(word), stdin) != (char *) NULL) {
- X len = strlen(word);
- X if (word[len - 1] != '\n') {
- X fprintf(stderr, "mksp: Word too long: %s", word);
- X while ((ch = getchar()) != '\n' && ch != EOF) /* flush */
- X putc(ch, stderr);
- X putc('\n', stderr);
- X continue;
- X }
- X word[--len] = '\0';
- X if (len > MAXWORDLEN) {
- X fprintf(stderr, "mksp: Word too long: %s\n", word);
- X continue;
- X }
- X
- X if ((s = soundex(word, 3)) == BAD_WORD) {
- X if (vflag)
- X fprintf(stderr, "Bad word: %s\n", word);
- X continue;
- X }
- X
- X c = 0;
- X while (1) {
- X dbm_key.dptr = (char *) key;
- X dbm_key.dsize = KEYSIZE;
- X mk_key(key, s, c);
- X dbm_content = FETCH(dbm_key);
- X if (dbm_content.dptr == NULL) {
- X if (vflag)
- X fprintf(stderr, "Not found: %s\n", word);
- X break;
- X }
- X
- X if (!IS_DELETED(dbm_content)) {
- X p = mk_word(dbm_content.dptr, dbm_content.dsize, s);
- X if (strmatch(word, p) == 0) {
- X /*
- X * Aside:
- X * Since dptr points to static storage it must be reset
- X * if we want to retain the old content (content.dptr=word)
- X * This took a while to determine...
- X * Anyhow, since there is no need to store the old word
- X * we free up the space
- X */
- X dbm_content.dptr = "";
- X dbm_content.dsize = 0;
- X if (REPLACE(dbm_key, dbm_content) == -1)
- X fprintf(stderr, "mksp: delete of '%s' failed\n", word);
- X else if (vflag) {
- X if (vflag > 1)
- X fprintf(stderr, "%d. %s ", c, p);
- X fprintf(stderr, "deleted\n");
- X }
- X break;
- X }
- X else if (vflag > 1)
- X fprintf(stderr, "%d. %s\n", c, p);
- X }
- X else if (vflag > 1)
- X fprintf(stderr, "%d. (deleted)\n", c);
- X
- X if (++c > MAXCOUNT) {
- X ch = isupper(word[0]) ? tolower(word[0]) : word[0];
- X fprintf(stderr, "mksp: Counter overflow\n");
- X fprintf(stderr, "soundex: %c%d\n", ch, b10(digit_part));
- X exit(1);
- X }
- X }
- X }
- X DBMCLOSE();
- X}
- X
- X/*
- X * Setup the dictionary files if necessary
- X */
- Xsetup(name)
- Xchar *name;
- X{
- X register int s1, s2;
- X
- X s1 = check_dict(name, ".dir");
- X s2 = check_dict(name, ".pag");
- X if (s1 == NEW_DICT && s2 == NEW_DICT)
- X return(NEW_DICT);
- X return(OLD_DICT);
- X}
- X
- X/*
- X * Check if a dictionary file exists:
- X * - if not, try to create it
- X * - if so, see if it is empty
- X * Return NEW_DICT if an empty file exists,
- X * OLD_DICT if a non-empty file exists
- X * Default mode for new files is 0666
- X */
- Xcheck_dict(name, ext)
- Xchar *name, *ext;
- X{
- X register int len, s;
- X char *filename;
- X struct stat statbuf;
- X char *malloc();
- X extern int errno;
- X
- X len = strlen(name) + strlen(ext) + 1;
- X filename = (char *) malloc((unsigned) len);
- X if (filename == (char *) NULL) {
- X fprintf(stderr, "mksp: Can't malloc: '%s%s'\n", name, ext);
- X exit(1);
- X }
- X sprintf(filename, "%s%s", name, ext);
- X if (stat(filename, &statbuf) == -1) {
- X if (errno != ENOENT) {
- X perror("mksp");
- X exit(1);
- X }
- X if (creat(filename, 0666) == -1) {
- X perror("mksp");
- X exit(1);
- X }
- X s = NEW_DICT;
- X }
- X else {
- X if (statbuf.st_size == 0)
- X s = NEW_DICT;
- X else
- X s = OLD_DICT;
- X }
- X return(s);
- X}
- X
- X/*
- X * Compute an 'n' digit Soundex code for 'word'
- X * As a side effect, leave the digit part of the soundex in digit_part
- X *
- X * Since the soundex can be considered a base 7 number, if 'n' is:
- X * 3 require 9 (10 if base 10) bits for digits
- X * 4 require 12 (13) bits
- X * 5 require 15 (17) bits
- X * 6 require 17 (20) bits
- X *
- X * The three slightly different versions of this routine should be coalesced.
- X */
- Xsoundex(word, n)
- Xregister char *word;
- Xint n;
- X{
- X register int c, soundex_length, previous_code;
- X register char *p, *w;
- X char wcopy[MAXWORDLEN + 2];
- X
- X if (!IS_VALID(word))
- X return(-1);
- X
- X strcpy(wcopy, word);
- X p = w = wcopy;
- X
- X while (*p != '\0') {
- X if (isupper(*p))
- X *p = tolower(*p);
- X p++;
- X }
- X
- X digit_part = 0;
- X soundex_length = 0;
- X previous_code = soundex_code_map[*w - 'a'];
- X for (p = w + 1; *p != '\0' && soundex_length < n; p++) {
- X if (!isalpha(*p))
- X continue;
- X c = soundex_code_map[*p - 'a'];
- X if (c == 0 || previous_code == c) {
- X previous_code = c;
- X continue;
- X }
- X digit_part = digit_part * 7 + c;
- X previous_code = c;
- X soundex_length++;
- X }
- X while (soundex_length++ < n)
- X digit_part *= 7;
- X return((digit_part << 5) + *w - 'a');
- X}
- X
- Xb10(n)
- Xint n;
- X{
- X register int b10, s;
- X
- X for (b10 = 0, s = 1; n != 0; n /= 7) {
- X b10 += (n % 7) * s;
- X s *= 10;
- X }
- X return(b10);
- X}
- X
- @//E*O*F mksp.c//
- if test 12797 -ne "`wc -c <'mksp.c'`"; then
- echo shar: error transmitting "'mksp.c'" '(should have been 12797 characters)'
- fi
- fi # end of overwriting check
- echo shar: extracting "'sp.1'" '(5932 characters)'
- if test -f 'sp.1' ; then
- echo shar: will not over-write existing file "'sp.1'"
- else
- sed 's/^X//' >sp.1 <<'@//E*O*F sp.1//'
- X.TH SP 1-LOCAL "11 December 1986"
- X.UC 4
- X.SH NAME
- Xsp \- give possible spellings
- X.br
- Xmksp \- maintain sp dictionaries
- X.br
- Xcalcsoundex \- calculate soundex values
- X.SH SYNOPSIS
- X.B sp
- X[
- X.B -vace
- X] [
- X.B -f dictionary-list
- X] [
- X.B word ...
- X]
- X.br
- X.B mksp
- X[
- X.B -adt
- X] [
- X.B -v#
- X]
- X.B dictionary
- X.br
- X.B calcsoundex
- X[
- X-v
- X]
- X.SH DESCRIPTION
- X.I Sp
- Xtakes one or more words as input and for each word prints
- Xa list of possible spellings.
- XIf the words are not given on the command line, the program prompts and reads
- Xfrom the standard input.
- XYou must know the first letter of the word.
- XUpper case is mapped to lower case.
- XWords must start with an alphabetic character, but any subsequent
- Xcharacters need not be alphabetic (but see the limitation below on words that
- Xmay appear in the dictionary).
- XBlanks are allowed within a word.
- X.PP
- XUp to ten dictionaries previously created by
- X.I mksp
- Xmay be specified by a command line argument and an environment variable.
- XThe name of a dictionary is specified by a pathname,
- Xnot including the suffix (.dir or .pag).
- XA list of dictionaries consists of one or more colon separated dictionary
- Xnames.
- XThe environment variable SPPATH may be set to a dictionary list.
- XIf a command line dictionary list is given in addition to the SPPATH variable,
- Xall dictionaries are used.
- XIf no dictionaries are specified the program looks for default dictionaries.
- X.PP
- XTo reduce the size of the word list, certain heuristics are used.
- XNormally, all words
- X.I sp
- Xconsiders to be a "satisfactory" match are printed.
- XThe \fB-c\fR option causes only close
- Xmatches or an exact match to be printed.
- XThe \fB-e\fR option only prints exact matches.
- XThe \fB-a\fR option causes all words matched to be printed.
- XThe output is sorted alphabetically and indicators are printed beside each
- Xword:
- X.sp 2
- X.in +0.5i
- X.nf
- X.na
- X X == exact match
- X.br
- X ! == close match
- X.br
- X * == good match
- X.br
- X ' ' == matched
- X.in -0.5i
- X.fi
- X.ad
- X.sp 2
- XDuplicated words are not removed from the listing.
- X.PP
- XIf the \fB-v\fR flag is given,
- X.I sp
- Xbecomes verbose.
- X.PP
- X.I Mksp
- Xis a program to maintain dictionaries for use with
- X.I sp.
- XThe \fB-a\fR option is used to create a new dictionary or to add
- Xwords to an existing dictionary.
- XThe words to be put in the dictionary are read from the standard input,
- Xone per line.
- XThe \fB-v\fR flag (which may immediately be followed by an optional number)
- Xcauses some information to be printed as words are processed.
- XA non-flag argument to
- X.I mksp
- Xis assumed to be the prefix of the name of the dictionary files.
- XThe dictionary consists of two files, one with a ".dir" suffix and one with
- Xa ".pag" suffix (see \fBdbm\fR(3X)).
- XIf these files do not exist,
- X.I mksp
- Xwill create both.
- XThe words need not be sorted.
- XThere should not be duplicates in the word list when creating a new dictionary
- Xbut when words are added to an existing dictionary, duplicates are ignored.
- XUpper case letters are stored in the dictionary but
- X.I mksp
- Xmaps upper case to lower case when checking for duplicate words.
- X.I Sp
- Xis case insensitive when searching the database.
- XThe first character of a word must be alphabetic and
- Xthe second character (if present) must be either alphabetic, a single quote,
- Xan ampersand, a period, or a blank.
- XThere is no restriction on the third and subsequent characters.
- X.PP
- XThe \fB-d\fR option is used to delete words from the specified dictionary.
- XThe words are read from the standard input.
- XIf a word is not found in the dictionary, no message is printed.
- XThe comparison is case insensitive.
- X.PP
- XThe \fB-t\fR option prints the contents of the specified dictionary.
- XThe words are not sorted.
- X.PP
- X.I Calcsoundex
- Xreads words from the standard input (one per line) and prints the soundex
- Xcode corresponding to each word on stdout.
- XWith the \fB-v\fR flag,
- X.I calcsoundex
- Xalso echoes each word.
- X.SH EXAMPLE
- X.in +0.5i
- X.na
- X.nf
- X% mksp -a mydictionary
- Xaardvark
- Xprecipitation
- X<more words>
- Xzyzz
- X<^D>
- X% sp -c -f mydictionary:herdictionary
- XWord? propogate
- X! 1. perfect
- X! 2. perfectible
- X<more words>
- X! 7. propagate
- X! 8. proposition
- X<more words>
- XWord? <^D>
- X.fi
- X.ad
- X.in -0.5i
- X.SH FILES
- X.nf
- X.na
- X<dictionary.pag>, <dictionary.dir> dictionary data base
- X/usr/local/lib/sp.dict.[12] default dictionaries
- X.fi
- X.ad
- X.SH LIMITATIONS
- XNo more than 10 dictionaries may be specified.
- XThe maximum length of a word is 50 characters.
- XThe program can return up to 400 matches taking up a maximum of 20480 bytes.
- XThe limitation on what the second character of a word can be is due to
- Xthe algorithm used to compress the dictionaries; there is some room in the
- Xdata structure for a few other characters to become valid.
- XDbm doesn't work between a Sun and VAX across NFS so you can't share a
- Xdictionary between a VAX and a Sun.
- X.PP
- XThere is a limit on the number of words having the same soundex code that can
- Xappear in a single dictionary.
- XThis value should be at least 256 (on a VAX/Sun it is 1024), but it depends
- Xon how the program has been configured locally.
- XIf you come up against this limitation you can split your dictionary; e.g.,
- Xextract every second word from the big dictionary to make a new dictionary,
- Xthen delete the words from the big dictionary.
- XThe following pipeline can be used to determine the number of times each
- Xsoundex code appears in a list of words (one per line):
- X.sp 2
- X.ti +0.5i
- Xcalcsoundex | sort | uniq -c | sort -r -0n
- X.SH "SEE ALSO"
- Xspell(1), uniq(1), dbm(3X), ndbm(3)
- X.br
- XDonald E. Knuth, The Art of Computer Programming, Vol. 3,
- XSorting and Searching, Addison-Wesley, pp. 391-392, 1973.
- X.SH AUTHOR
- XBarry Brachman
- X.br
- XDept. of Computer Science
- X.br
- XUniversity of British Columbia
- X.SH BUGS
- XYou may not agree on what constitutes a match.
- XYou are likely to have to create your own dictionary as the UNIX
- Xdictionary is far from complete. In particular, the suffixes
- Xhave been removed from most words.
- XThe limitations mentioned above are arbitrary.
- XThe limitation on the second character of a word is disgusting.
- X
- @//E*O*F sp.1//
- if test 5932 -ne "`wc -c <'sp.1'`"; then
- echo shar: error transmitting "'sp.1'" '(should have been 5932 characters)'
- fi
- fi # end of overwriting check
- echo shar: extracting "'MANIFEST'" '(1093 characters)'
- if test -f 'MANIFEST' ; then
- echo shar: will not over-write existing file "'MANIFEST'"
- else
- sed 's/^X//' >MANIFEST <<'@//E*O*F MANIFEST//'
- X File Name Kit # Description
- X-----------------------------------------------------------
- X MANIFEST 1 This shipping list
- X Makefile 1 Makefile for use with old dbm routines and source
- X Makefile.newdbm 1 Makefile for 4.3BSD dbm, dbmclose(), or old dbm w/o source
- X README 1 Configuration instructions, etc.
- X calcsoundex.c 1 Program to calculate soundex codes
- X dbm.bug 1 A bug report for the old dbm routines
- X dbm.diffs 1 Diffs to be applied to old dbm routines
- X dbmstuff.c 1 Interface to old and new dbm routines
- X misc.c 1 Miscellaneous support routines
- X mksp.c 1 Program to maintain dictionaries
- X sp.1 1 Man page for sp/mksp/calcsoundex
- X sp.9 2 Man page for EMACS interface to sp
- X sp.c 2 Program to search dictionaries
- X sp.h 2 Header file
- X sp.ml 2 Mlisp code for EMACS interface to sp
- @//E*O*F MANIFEST//
- if test 1093 -ne "`wc -c <'MANIFEST'`"; then
- echo shar: error transmitting "'MANIFEST'" '(should have been 1093 characters)'
- fi
- fi # end of overwriting check
- echo shar: "End of archive 1 (of 2)."
- cp /dev/null ark1isdone
- DONE=true
- for I in 1 2; do
- if test ! -f ark${I}isdone; then
- echo "You still need to run archive ${I}."
- DONE=false
- fi
- done
- case $DONE in
- true)
- echo "You have run both archives."
- echo 'See the README'
- ;;
- esac
- ## End of shell archive.
- exit 0
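Note on the soundex() routine in mksp.c: the routine packs the first letter of the word into the low 5 bits of the result and the n soundex digits, read as a base 7 number, into the bits above it. What follows is a minimal ANSI C sketch of that packing, not the archive's code. The names soundex_sketch, base7_to_decimal and code_map are hypothetical, and the table is the standard Soundex letter table from Knuth, assumed here because soundex_code_map is not defined in this part of the archive.

```c
#include <assert.h>
#include <ctype.h>

/*
 * Standard Soundex digit for each letter a..z (0 = letter is not coded).
 * Assumption: this stands in for the archive's soundex_code_map.
 */
static const int code_map[26] = {
    /* a  b  c  d  e  f  g  h  i  j  k  l  m */
       0, 1, 2, 3, 0, 1, 2, 0, 0, 2, 2, 4, 5,
    /* n  o  p  q  r  s  t  u  v  w  x  y  z */
       5, 0, 1, 2, 6, 2, 3, 0, 1, 0, 2, 0, 2
};

/*
 * Return the first letter (0..25) in the low 5 bits and 'n' base 7
 * soundex digits above it, or -1 if the word does not start with a letter.
 * Assumes an ASCII-like character set where 'a'..'z' are contiguous.
 */
int soundex_sketch(const char *word, int n)
{
    int digits = 0, len = 0, prev, c, first;
    const char *p;

    assert(n >= 1 && n <= 6);
    if (!isalpha((unsigned char) word[0]))
        return -1;

    first = tolower((unsigned char) word[0]) - 'a';
    prev = code_map[first];
    for (p = word + 1; *p != '\0' && len < n; p++) {
        if (!isalpha((unsigned char) *p))
            continue;                   /* non-letters are skipped */
        c = code_map[tolower((unsigned char) *p) - 'a'];
        if (c != 0 && c != prev) {      /* drop uncoded letters and repeats */
            digits = digits * 7 + c;
            len++;
        }
        prev = c;
    }
    while (len++ < n)                   /* pad short codes with zero digits */
        digits *= 7;
    return (digits << 5) + first;
}

/* Rewrite a base 7 digit string as a decimal number for printing. */
int base7_to_decimal(int n)
{
    int out = 0, scale = 1;

    for (; n != 0; n /= 7) {
        out += (n % 7) * scale;
        scale *= 10;
    }
    return out;
}
```

Under this table "propogate", "propagate" and "perfect" all map to the same value, which is consistent with the sample session in sp.1, where a query for "propogate" lists both of the other words. Shifting the result right by 5 and passing it to base7_to_decimal gives the familiar three digit form (612 for these words).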
-
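Note on deletewords() in mksp.c: the comment above that routine explains why a deleted word is marked rather than removed, so that the counter sequence for a soundex code stays continuous. The idea can be sketched with an in-memory array standing in for the dbm file. Everything below is hypothetical illustration (struct chain, chain_add, chain_delete, CHAIN_MAX); the real program stores each slot under a (soundex, counter) key, compares words without regard to case, and marks a deleted entry with zero-length content. Words are assumed to be non-empty.

```c
#include <assert.h>
#include <string.h>

#define CHAIN_MAX 8     /* hypothetical; plays the role of MAXCOUNT + 1 */
#define WORD_MAX  50

/* One soundex chain: slot i stands for the key (soundex, counter i). */
struct chain {
    int  used[CHAIN_MAX];               /* 0 = never stored; ends the chain */
    char word[CHAIN_MAX][WORD_MAX + 1]; /* "" marks a deleted entry */
};

/* Return the slot used, -1 for a duplicate, -2 if the chain is full. */
int chain_add(struct chain *ch, const char *w)
{
    int c, reuse = -1;

    for (c = 0; c < CHAIN_MAX && ch->used[c]; c++) {
        if (ch->word[c][0] == '\0') {
            if (reuse < 0)
                reuse = c;      /* remember deleted slot nearest the front */
        }
        else if (strcmp(ch->word[c], w) == 0)
            return -1;
    }
    if (reuse >= 0)
        c = reuse;              /* fill the hole instead of growing */
    else if (c == CHAIN_MAX)
        return -2;
    ch->used[c] = 1;
    strncpy(ch->word[c], w, WORD_MAX);
    ch->word[c][WORD_MAX] = '\0';
    return c;
}

/* Mark a word as deleted; return its slot, or -1 if it is not present. */
int chain_delete(struct chain *ch, const char *w)
{
    int c;

    for (c = 0; c < CHAIN_MAX && ch->used[c]; c++) {
        if (ch->word[c][0] != '\0' && strcmp(ch->word[c], w) == 0) {
            ch->word[c][0] = '\0';  /* keep the slot, drop the word */
            return c;
        }
    }
    return -1;
}
```

Because a deleted slot still counts as part of the chain, a search continues past it and reaches entries with larger counters. Removing the slot outright would end the walk early and hide those entries, which is the problem the comment describes.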